Abstract

Previous efforts to investigate the equivalence of rating sources for
job analysis ratings have reported conflicting results. In the present
research, correlational and generalizability analyses were conducted to
examine the equivalency of rating sources for over 70 state civil service
job classifications. Incumbent and supervisor ratings were examined, along
with trained experts' ratings derived from narrative job descriptions.
Separate survey instruments were individually developed to assess the
objective skills and abilities required for each job classification. Pearson
correlation coefficients indicated significant reliability for ratings
within each rating source, and significant convergent validities across
sources. Results of the generalizability analyses were inconclusive
concerning the similarity of ratings; however, the variance attributed to
rating sources was negligible. These findings lend additional credence to
Smith and Hakel's (1979) belief that rating source does not make any
practical difference in job analysis ratings.

Literature Review 

The purpose of this study was to investigate the effects of different 
sources of job analysis ratings. The few studies that have examined such 
effects have used the traditional correlational approach (Burt, 1980); this 
study utilized the correlational approach and is the first to apply
generalizability theory to such comparisons. Specifically, this paper
attempts to examine the effects of rater source by (1) comparing
correlational and generalizability approaches, (2) correcting previously
inappropriate applications of the correlational approach with a prescribed
method for future analyses, and (3) utilizing a large number of raters and
different job classifications to increase the generalizability of these
findings. While several past studies have investigated the
reliability of ratings, few have systematically examined the effects of
different sources of ratings or the more general question of the
equivalency of rating sources (Burt, 1980).

The reliability and validity of job ratings are essential, since
occupational choice and organization choice decisions rely on the accuracy
of the information available to the decision maker. Smith and Hakel (1979)
indicate that there is no practical difference between rating sources,
including naive raters, in terms of reliability and convergent validity. In
comparing ratings on several dimensions of occupations with the
Position Analysis Questionnaire (PAQ), they found that mean reliability
coefficients ranged from .49 to .63 across rating sources. Much of their
support for the equivalence of rating sources is based on the reported high
convergent validities, i.e., the average correlation between pairs of raters
from two different rating sources (mean r = .92).

However, Cornelius III, DeNisi, and Blencoe (1984) note that
individual jobs are the object of measurement in such a study, and
reliabilities and convergent validities must be obtained at the level of the
individual job, not across jobs as conducted by Smith and Hakel (1979). In a
limited nine-job replication with a sample of only 13 raters, they applied
the corrected method and obtained a lower convergent validity (mean r = .58).

Several other attempts to compare different rater sources (Jones, Main,
Butler, & Johnson, 1982; Burt, 1980) have only further confused the
picture concerning a "shared stereotype." One limitation of previous
studies is their almost exclusive use of the PAQ as the measurement
instrument. Properties of the PAQ may artificially increase reliability and
validity among raters (Smith & Hakel, 1979). Large numbers of items scored
as "Does Not Apply" for specific jobs may have inflated validities in these
studies. This study attempted to investigate the equivalency of rater
sources with an ability-oriented job analysis instrument.

Methods 

Seventy job classifications that are listed in the professional or 
technical areas of the Illinois Civil Service System were utilized for this
study. Each job was classified in one of seven subgroups based on job 
content. 

Three rater sources were employed: incumbents, supervisors, and ratings
of narrative job descriptions by expert raters. Initially, the expert raters,
three mid-level administrators at the Personnel Services Office
of a major Illinois university, were trained in identifying the objective
abilities and skills required by several out-of-date job classifications.
The experts then rated each of the 70 job classifications, identifying the
skills and abilities needed for each job. Descriptive phrases for each of the
four (0-3) scale points were developed through consensus agreement to help
"anchor" the scale. Based on this initial analysis, separate surveys were
developed for each job classification that contained items measuring each
skill and ability required. Therefore, the number of items in each survey
differed. All incumbents and supervisors employed in these job
classifications were sent these surveys. The return rate for these surveys
exceeded 50%, with a sample of 697 incumbents and supervisors across all
jobs. The mean number of surveys returned for each job by incumbents and
supervisors was 6.79 and 3.07, respectively. Each job classification was also
analyzed by the three expert raters, providing a mean of 13 ratings per job.

Results 

Reliabilities of the job analysis ratings for each rating source were
computed at the level of the individual job. All possible pairwise
correlations between raters within each source were computed and averaged
for each job. The mean and median correlations for each of the seven job
content areas, and for the total of all 70 jobs, were computed in a similar
fashion. However, mean values for each job were weighted by the number of
raters and items (on each specific survey). This weighting procedure was
required since different numbers of raters and different numbers of items
were used across job classifications. Mean Pearson correlation coefficients
across all jobs were .579 for narrative ratings, .466 for incumbents, and
.440 for supervisors. Median correlations were .540 for narrative ratings,
.543 for incumbents, and .bS2 for supervisors. Correlations for all three
rating sources were significant (p < .001) for both mean and median figures.
While these correlations are moderately smaller than those obtained in
earlier studies (McCormick et al., 1972; Burt, 1980), they are relatively
similar to correlations in studies investigating the equivalency of rating
source and narrative ratings (Jones et al., 1982; Smith & Hakel, 1979;
Cornelius III et al., 1984).
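
As a minimal sketch of the reliability procedure described above, the
following Python code averages all pairwise Pearson correlations among the
raters of one job and then forms a weighted mean across jobs. The function
names, the example ratings, and the exact form of the weight (number of
rater pairs times number of items) are assumptions made for illustration;
the study does not report its computational details.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length rating vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def job_reliability(ratings):
    """Mean of all pairwise correlations among the raters of one job.

    ratings: list of rating vectors, one per rater, over the same items.
    Returns (mean_r, number_of_rater_pairs).
    """
    rs = [pearson(a, b) for a, b in combinations(ratings, 2)]
    return sum(rs) / len(rs), len(rs)


def weighted_mean_r(jobs):
    """Weighted mean of job-level reliabilities.

    Assumed weight for each job: rater pairs x items rated.
    """
    num = den = 0.0
    for ratings in jobs:
        mean_r, n_pairs = job_reliability(ratings)
        weight = n_pairs * len(ratings[0])
        num += weight * mean_r
        den += weight
    return num / den


# Hypothetical 0-3 ratings: job_a has 3 raters and 4 items,
# job_b has 2 raters and 3 items.
job_a = [[0, 1, 2, 3], [0, 1, 2, 3], [3, 2, 1, 0]]
job_b = [[0, 1, 3], [0, 1, 3]]
print(job_reliability(job_a))
print(weighted_mean_r([job_a, job_b]))
```

Weighting by pairs and items keeps a job with few raters or a short survey
from counting as much as a job rated by many raters on many items.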

Convergent validities were computed in a similar fashion, with all
pairwise correlations between rating sources computed and then averaged for
each job. These convergent validities ranged from .477 to .518 using Smith
and Hakel's (1979) method for comparison. Table 1 illustrates the
reliabilities and convergent validities for each of the seven content areas
of jobs. Differences between methods across jobs were not substantial.
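
The job-level convergent validity described above can be sketched in the
same way: correlate every rater in one source with every rater in another
source for the same job, then average. The function names and example
ratings below are hypothetical and are offered only as an illustration of
the idea, not as the study's actual computation.

```python
from itertools import product
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length rating vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def convergent_validity(source_a, source_b):
    """Mean correlation over all rater pairs, one rater from each source."""
    rs = [pearson(a, b) for a, b in product(source_a, source_b)]
    return sum(rs) / len(rs)


# Hypothetical 0-3 ratings of one job on four items.
incumbents = [[0, 1, 2, 3], [1, 1, 2, 3]]
supervisors = [[0, 1, 2, 3]]
print(convergent_validity(incumbents, supervisors))
```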



Generalizability analysis was used to examine the relative amount of
variance contributed by each of the factors used in this study (i.e., items,
rating source, raters within rating source, and their interactions). Separate
generalizability analyses were performed for each of the 70 job
classifications using an unbalanced, fixed-effects analysis of variance
(ANOVA) model in which items are crossed with raters who are nested within
rating method. Separate analyses were used to allow detailed examination of
the instrument's ability to match an individual with each job classification.
Generalizability coefficients, estimated variance components, and error
terms for each unbalanced design, with different raters and items, were
computed through procedures outlined by Brennan (1982, p. 111). While only
10 of the 70 generalizability coefficients (ρ²) met the criterion of .80
suggested by Cardinet et al. (1982), over half of the coefficients
exceeded .5b (ranging from .26 to .87).
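
To make the link between ANOVA mean squares and variance components
concrete, the sketch below solves the expected-mean-square equations for a
design with items crossed with raters nested within source. It assumes a
balanced, random-effects version of the design, which is a simplification:
the study used unbalanced designs and Brennan's (1982) procedures. The
function name and the example mean squares are hypothetical.

```python
def variance_components(ms, n_i, n_s, n_r):
    """Estimate variance components for a balanced i x (r:s) design.

    ms: mean squares keyed by effect:
        'i' (items), 's' (source), 'r:s' (raters within source),
        'is' (items x source), 'ri:s' (residual).
    n_i, n_s, n_r: number of items, sources, and raters per source.
    Assumes random effects and equal cell sizes (not the study's design).
    Negative estimates are set to zero, a common convention.
    """
    est = {
        "ri:s": ms["ri:s"],
        "is": (ms["is"] - ms["ri:s"]) / n_r,
        "r:s": (ms["r:s"] - ms["ri:s"]) / n_i,
        "i": (ms["i"] - ms["is"]) / (n_s * n_r),
        "s": (ms["s"] - ms["is"] - ms["r:s"] + ms["ri:s"]) / (n_i * n_r),
    }
    return {effect: max(value, 0.0) for effect, value in est.items()}


# Hypothetical mean squares for 10 items, 3 sources, 4 raters per source.
example_ms = {"i": 6.6, "s": 2.4, "r:s": 1.4, "is": 0.6, "ri:s": 0.4}
print(variance_components(example_ms, n_i=10, n_s=3, n_r=4))
```

In a balanced design each component is a simple difference of mean squares
divided by the number of observations per level; unbalanced designs need
the more involved estimators the study cites.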

Estimated variance components show that rating source was not a
significant factor in the ratings (mean σ²(rs/RS) = .0244). Table 2 presents
the average estimated variance components derived from the generalizability
analyses across the 70 jobs. Estimated variance components for items and
jobs accounted for the greatest proportion of variance. However,
unaccounted-for error variance was also substantial for some jobs.
Idiosyncratic differences in the interpretation of items by individual
raters across rater sources could explain this error variance. The primary
method for reducing it would involve instituting some type of rater
training. Because the variance contributed by rater source and by raters is
so small across analyses, replications of this study would probably not lead
to significantly different estimates for these sources (Doverspike, Carlisi,
Barrett, & Alexander, 1983).
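
As a worked illustration, the average components reported in Table 2 can be
expressed as shares of their sum. The values below are taken from Table 2;
treating their simple sum as the total variance is an assumption made for
illustration, since the components are averages over 70 separate analyses.

```python
# Average estimated variance components from Table 2.
components = {
    "items": 0.548,
    "rater source": 0.024,
    "raters within source": 0.088,
    "items x rater source": 0.039,
    "raters within source x items (residual)": 0.427,
}

# Assumption: the sum of the averaged components stands in for total variance.
total = sum(components.values())
shares = {name: value / total for name, value in components.items()}

for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

On this reading, rater source accounts for roughly 2% of the summed
components, while items and the residual together account for most of it.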



Insert Table Two About Here 

Discussion

Results indicate that rating source does not have a significant effect
on job analysis ratings when using an ability-oriented instrument. Ratings
provided by incumbents, supervisors, and experts (based on narrative
descriptions) were similar in 65 of 70 analyses (σ²(rs/RS) < .07). However,
this study acknowledges that true equivalency of rating sources has
not been demonstrated under strict measurement criteria (Gulliksen, 1968).

Consistent results derived from the generalizability analyses suggest
that rating source does not contribute significant variance to job analysis
ratings. The correlations and convergent validities, within and between
sources, support this statement. Higher reliabilities have occasionally
been obtained with the PAQ; however, these may be caused by
excessive use of "does not apply" items, leniency error, job level, and the
generic design of the instrument. This study examined specific abilities in
jobs with subtle differences (Accountant 1 vs. Accountant 2, etc.).

In summary, no practical difference can be found between rating
sources. The importance of this study lies in the finding that accurate
job analytic data can be obtained from several sources for a wide variety of
jobs. Narrative job descriptions provided an accurate source of job data
within these strictly controlled conditions. Further studies might do well
to address the "shared stereotype" explanation for rater source similarities
with instruments other than the PAQ. Further investigation of the
equivalency of rating sources with generalizability theory and multivariate
analyses is also needed in areas within industrial/organizational psychology.

Table 1

Interrater reliability by rater source and job content area. Correlations
were calculated for each individual job; mean correlations were then
weighted by the number of items rated and the number of pairwise
correlations.

Job Content Area         Incumb.   Super.   Raters    R X S    R X I    S X I

1.) Business   Mean r       .435     .409     .562     .442     .486     .405
               N             958      458       23      286      510     1270
               Sig. L       .001     .001     .001     .001     .001     .001

2.) Commun.    Mean r       .538     .717     .718     .696     .569     .669
               N             112       49       14       80      194      109
               Sig. L       .001     .001     .001     .001     .001     .001

3.) Science    Mean r       .444     .572     .544     .431     .503     .489
               N             352      106       16      164      201      224
               Sig. L       .001     .001     .001     .001     .001     .001

4.) Technical  Mean r       .630     .647     .574     .613     .634     .569
               N             231       37       14      102      221      199
               Sig. L       .001     .001     .001     .001     .001     .001

5.) Education  Mean r       .536     .895     .564     .661     .595     .756
               N              35        6        8       41       78       4d
               Sig. L       .001     .001     .001     .001     .001     .001

6.) Arts       Mean r       .610     .483     .965     .660     .733     .425
               N              62       22        7       54       91       86
               Sig. L       .001     .001     .001     .001     .001     .001

7.) Health     Mean r       .417     .528     .173     .133     .284     .427
               N            1939      322       21      251      525     1638
               Sig. L       .001     .001       NS       NS      .01     .001

TOTAL          Mean r       .466     .440     .579     .477     .518     .432
               N             511      206       18      176      336      616
               Sig. L       .001     .001     .001     .001     .001     .001

R = expert raters, S = supervisors, I = incumbents; R X S, R X I, and S X I
are correlations between rating sources. Sig. L = significance level.
N = mean number of pairwise correlations computed between raters X n for
each job.
n = mean number of items rated by each pair of raters.


Table 2

Average Estimated Variance Components for Generalizability
Analysis of 70 Job Classifications*

Variance Component                 MS       DF   Est. σ²

Items                           6.709    15.44     .548
Rater Source                    1.585     2        .024
Raters within Source            1.442     9.88     .088
Items X Rater Source             .572    30.88     .039
Raters within Source X Items     .427   145.86     .427

* An average of estimated variance components.